feat(install): slim install for remote/NIM-only inference on Intel Mac#1830
charlesbluca wants to merge 30 commits into NVIDIA:main from
Conversation
…d macOS Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
macOS CI runners (Apple Silicon) can be slow to start the Ray raylet, causing the node to time out during startup. Setting the wait to 120s prevents premature failures on resource-constrained runners. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guard cv2, sklearn, langdetect, minio, pymilvus, and unstructured_client behind try/except or deferred imports so nemo_retriever and nv_ingest_api can be imported without GPU/local-inference packages installed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- pdfium.py: numpy channel-swap fallback when cv2 is None
- extract.py: PIL encode fallback for page image rendering without cv2
- transforms.py: PIL fallbacks for numpy_to_base64, base64_to_numpy
- table_and_chart.py (api + nemo_retriever): guard bare sklearn DBSCAN imports with ImportError fallback using naive y-grid row grouping
- ocr/shared.py: same guard for _blocks_to_pseudo_markdown DBSCAN
- lancedb_store.py: skip index creation when no embedding rows written
- text_embed/runtime.py: raise on embedding failure instead of silent no-op
- graph/executor.py: per-stage debug logging for inprocess pipeline
- text_embed/cpu_operator.py: debug logging for embed input/output stats

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
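The cv2-free base64 paths in transforms.py can be pictured with a short sketch. This is illustrative only: the function names below are stand-ins, not the PR's actual signatures.

```python
import base64
import io

import numpy as np
from PIL import Image


def numpy_to_base64_pil(arr: np.ndarray) -> str:
    """Encode an RGB uint8 array as a base64 PNG using PIL instead of cv2."""
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")


def base64_to_numpy_pil(b64: str) -> np.ndarray:
    """Decode a base64 PNG back into an RGB numpy array."""
    raw = base64.b64decode(b64)
    return np.array(Image.open(io.BytesIO(raw)).convert("RGB"))


# Round-trip check: PNG is lossless, so the array survives unchanged.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255  # solid red
assert np.array_equal(base64_to_numpy_pil(numpy_to_base64_pil(img)), img)
```

The upside of this shape is that PIL is a lightweight dependency relative to opencv, so remote-only installs can still encode and decode images.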
Windows also supports CUDA torch wheels; extend the platform marker to sys_platform == 'linux' or sys_platform == 'win32' so Windows users get the GPU-enabled wheel from the PyTorch CUDA index rather than falling through to the CPU-only PyPI wheel. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Greptile Summary

This PR streamlines the
| Filename | Overview |
|---|---|
| api/src/nv_ingest_api/util/image_processing/transforms.py | Adds PIL fallbacks for every cv2 code path; minor RGBA compositing mismatch and a vestigial except ImportError remain. |
| nemo_retriever/src/nemo_retriever/text_embed/runtime.py | Converts print/BaseException error handler to proper logger.error/Exception with graceful DataFrame return; correct and intentional behavior change. |
| nemo_retriever/src/nemo_retriever/text_embed/cpu_operator.py | Adds local HF embedder path when no remote endpoint is configured; logic and lazy import pattern are correct. |
| nemo_retriever/src/nemo_retriever/graph/executor.py | Adds Ray 2.49 concurrency=1 compatibility fix and per-stage debug logging; logging import placed inside method body rather than at module level. |
| nemo_retriever/pyproject.toml | Moves GPU/ML deps to [local], audio/SVG to [multimedia], SQL/graph stores to [stores], benchmarks to [benchmarks]; lancedb stays in core; torch index renamed to pytorch-cu130 with correct platform markers. |
| api/src/nv_ingest_api/util/string_processing/__init__.py | New ensure_openai_embeddings_http_url helper is idempotent and correctly prevents /embeddings/embeddings doubling. |
| .github/workflows/integration-test-library-mode.yml | New nightly integration test for Windows/macOS ARM/Intel; third-party actions still on mutable version tags (previously noted), and no permissions: block is declared. |
| nemo_retriever/src/nemo_retriever/vector_store/lancedb_store.py | Removes redundant try/except around import lancedb inside _write_rows_to_lancedb; adds empty-rows guard in handle_lancedb; import lancedb at module level is appropriate since lancedb is a core dependency. |
| nemo_retriever/src/nemo_retriever/txt/split.py | Fixes docstring placement and adds empty-content early-return in txt_file_to_chunks_df; txt_bytes_to_chunks_df is missing the same guard. |
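For context on the `ensure_openai_embeddings_http_url` row above, an idempotent suffix helper can be sketched like this. This is a hypothetical reimplementation for illustration, not the `nv_ingest_api` code.

```python
from urllib.parse import urlparse


def ensure_embeddings_suffix(url: str) -> str:
    """Append '/embeddings' to an HTTP embedding URL exactly once.

    Illustrative sketch; the real helper may handle more cases.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return url  # leave non-HTTP (e.g. gRPC) endpoints untouched
    trimmed = url.rstrip("/")
    if trimmed.endswith("/embeddings"):
        return trimmed
    return trimmed + "/embeddings"


assert ensure_embeddings_suffix("http://host:8000/v1") == "http://host:8000/v1/embeddings"
# Idempotent: applying it to its own output does not double the suffix.
assert ensure_embeddings_suffix("http://host:8000/v1/embeddings") == "http://host:8000/v1/embeddings"
```

Checking for an existing suffix before appending is what prevents the `/embeddings/embeddings` doubling the review table mentions.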
Sequence Diagram

```mermaid
sequenceDiagram
    participant GP as graph_pipeline
    participant CPU as _BatchEmbedCPUActor
    participant RT as embed_text_main_text_embed
    participant HF as create_local_embedder
    participant NIM as Remote NIM /embeddings
    GP->>CPU: __init__(EmbedParams)
    alt embed_invoke_url set
        CPU->>CPU: self._model = None
        Note over CPU: Remote path
    else no endpoint
        CPU->>HF: create_local_embedder(model_name, device=cpu)
        HF-->>CPU: local HF model
        Note over CPU: Local CPU path
    end
    GP->>CPU: process(df)
    CPU->>RT: embed_text_main_text_embed(df, model=self._model, ...)
    alt model is not None (local)
        RT->>HF: model.embed(texts)
        HF-->>RT: vectors
    else endpoint is set (remote)
        RT->>NIM: POST /embeddings (ensure_openai_embeddings_http_url)
        NIM-->>RT: embeddings JSON
    end
    alt success
        RT-->>CPU: out_df (has_embedding=True)
    else Exception
        RT->>RT: logger.error(..., exc_info=True)
        RT-->>CPU: out_df (has_embedding=False, error=str(exc))
    end
    CPU-->>GP: out_df
```
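The init-time branch in the diagram boils down to a small dispatch, sketched here with a stubbed embedder factory. Class and attribute names are illustrative, taken from the diagram rather than the PR source; the real code calls `create_local_embedder`, which loads a HuggingFace model.

```python
from typing import Callable, Optional


class BatchEmbedCPUActorSketch:
    """Sketch of the remote-vs-local embed dispatch (names illustrative)."""

    def __init__(self, embed_invoke_url: Optional[str], embedder_factory: Callable[[], object]):
        if embed_invoke_url:
            # Remote path: hold no local model; rows are POSTed to the NIM endpoint.
            self._model = None
            self._endpoint = embed_invoke_url
        else:
            # Local CPU path: build an in-process embedder once at actor init.
            self._model = embedder_factory()
            self._endpoint = None

    def mode(self) -> str:
        return "remote" if self._model is None else "local"


remote = BatchEmbedCPUActorSketch("http://nim:8000/v1", lambda: object())
local = BatchEmbedCPUActorSketch(None, lambda: object())
assert remote.mode() == "remote"
assert local.mode() == "local"
```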
Prompt To Fix All With AI
This is a comment left during a code review.
Path: api/src/nv_ingest_api/util/image_processing/transforms.py
Line: 697-718
Comment:
**PIL RGBA fallback composites on black, cv2 path uses white**
When cv2 is unavailable, `Image.open(BytesIO(image_bytes)).convert("RGB")` composites alpha onto a black background (PIL's default), while the cv2 path calls `rgba_to_rgb_white_bg` which blends against white. For document images (which are typically white-background), this produces visually different results in the cv2-free path.
```suggestion
else:
from io import BytesIO
pil_img = Image.open(BytesIO(image_bytes))
if pil_img.mode == "RGBA":
background = Image.new("RGB", pil_img.size, (255, 255, 255))
background.paste(pil_img, mask=pil_img.split()[3])
pil_img = background
else:
pil_img = pil_img.convert("RGB")
img = np.array(pil_img)
```
How can I resolve this? If you propose a fix, please make it concise.
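A tiny self-contained repro of the mismatch the comment describes, assuming transparent pixels are stored as `(0, 0, 0, 0)` as PNG encoders commonly emit:

```python
import numpy as np
from PIL import Image


def rgba_to_rgb_white(pil_img: Image.Image) -> Image.Image:
    """Composite an RGBA image onto a white background (shape of the suggested fix)."""
    background = Image.new("RGB", pil_img.size, (255, 255, 255))
    background.paste(pil_img, mask=pil_img.split()[3])  # alpha band as paste mask
    return background


# A fully transparent pixel whose hidden color channels are zero.
rgba = Image.new("RGBA", (1, 1), (0, 0, 0, 0))

naive = np.array(rgba.convert("RGB"))      # alpha dropped -> black pixel
white = np.array(rgba_to_rgb_white(rgba))  # composited    -> white pixel

assert naive[0, 0].tolist() == [0, 0, 0]
assert white[0, 0].tolist() == [255, 255, 255]
```

For scanned documents, which are overwhelmingly white-background, the white-composite path keeps the cv2-free output visually consistent with `rgba_to_rgb_white_bg`.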
---
This is a comment left during a code review.
Path: api/src/nv_ingest_api/util/image_processing/transforms.py
Line: 714-715
Comment:
**Vestigial `except ImportError: raise`**
After the refactoring the only potential `ImportError` source inside this `try` block is `from io import BytesIO`, which is a stdlib module that cannot fail. The clause is dead code and may confuse future readers.
```suggestion
except Exception as e:
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/executor.py
Line: 131-133
Comment:
**Module-level logger preferred over inline import**
`import logging as _logging` and the `_exec_log = ...` call are placed inside `execute()` on every invocation. The standard pattern is a single module-level `logger = logging.getLogger(__name__)`. If the goal was to avoid conflicts with an existing name, the existing module-level logger can simply be reused.
Move `import logging` and `logger = logging.getLogger(__name__)` to module scope alongside the existing imports, then replace `_exec_log` references with `logger`.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/txt/split.py
Line: 217-220
Comment:
**`txt_bytes_to_chunks_df` missing empty-content guard**
`txt_file_to_chunks_df` received an early return for empty/whitespace-only content (returning a correctly-typed empty DataFrame). `txt_bytes_to_chunks_df` did not get the same treatment. With empty bytes, `raw` will be `""`, `split_text_by_tokens` may return `[]`, and `pd.DataFrame([])` produces a 0-column DataFrame — downstream callers that access `df["text"]` would raise `KeyError`.
```python
raw = content_bytes.decode(encoding, errors="replace")
if not raw or not raw.strip():
return pd.DataFrame(
columns=["text", "path", "page_number", "metadata"],
).astype({"page_number": "int64"})
model_id = tokenizer_model_id or DEFAULT_TOKENIZER_MODEL_ID
```
How can I resolve this? If you propose a fix, please make it concise.
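A quick demonstration of the failure mode and of the typed-empty shape the suggestion uses:

```python
import pandas as pd

# pd.DataFrame([]) has zero columns, so any column access raises KeyError downstream.
empty = pd.DataFrame([])
assert list(empty.columns) == []
try:
    empty["text"]
    raise AssertionError("expected KeyError")
except KeyError:
    pass

# The guarded early-return keeps the schema (and page_number dtype) intact with no rows.
typed_empty = pd.DataFrame(columns=["text", "path", "page_number", "metadata"]).astype(
    {"page_number": "int64"}
)
assert list(typed_empty.columns) == ["text", "path", "page_number", "metadata"]
assert str(typed_empty["page_number"].dtype) == "int64"
```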
---
This is a comment left during a code review.
Path: .github/workflows/integration-test-library-mode.yml
Line: 1-10
Comment:
**Missing `permissions:` block**
Per the `github-actions-security` rule, new workflows should declare an explicit least-privilege `permissions:` block. Without it the job inherits the repository's default token permissions. A read-only scope is sufficient for this checkout-and-test workflow:
```yaml
permissions:
contents: read
```
How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/integration-test-library-mode.yml
Line: 47-54
Comment:
**GitHub Actions not pinned to commit SHA**
Both `actions/checkout@v4` and `actions/setup-python@v5` use mutable version tags. Per the repository's `github-actions-security` rule, third-party actions must be pinned to a full commit SHA to prevent supply-chain attacks.
```suggestion
- name: Check out repository code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.source-ref != '' && inputs.source-ref || github.ref }}
- name: Set up Python 3.12
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
with:
python-version: '3.12'
```
How can I resolve this? If you propose a fix, please make it concise.

- Wrap bare `import lancedb` in try/except in lancedb_store.py; add None guard in handle_lancedb for [remote]-only installs
- Make graph_pipeline.py import of handle_lancedb lazy to avoid startup crash when lancedb is not installed
- Add None sentinels for all optional pymilvus/minio/scipy imports in milvus.py (previously only CONSISTENCY_BOUNDED was set)
- Guard cv2.INTER_LANCZOS4 and cv2.resize in _resize_image_opencv with PIL fallback when cv2 is unavailable
- Restore graceful degradation in embed_text_main_text_embed: return error-payload DataFrame instead of re-raising on embedding failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
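The try/except import guard with a None sentinel described above follows a common pattern, sketched here. The `handle_lancedb` body and error message are assumptions for illustration, not the PR's code.

```python
# Optional-import guard: set a None sentinel at import time, then fail with a
# clear message only when the optional feature is actually used.
try:
    import lancedb  # optional dependency in [remote]-only installs
except ImportError:
    lancedb = None


def handle_lancedb_sketch(uri: str):
    """Connect to a LanceDB store, or fail loudly if lancedb is not installed."""
    if lancedb is None:
        raise RuntimeError(
            "lancedb is not installed; install nemo-retriever with a lancedb-capable extra"
        )
    return lancedb.connect(uri)
```

The key property is that a remote-only install can import the module freely; the error surfaces only on the code path that genuinely needs the missing package.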
…izer

- Restore handle_lancedb as module-level import in graph_pipeline.py so monkeypatch.setattr works in batch_pipeline tests (safe now that lancedb_store.py guards its own import lancedb)
- Early-return in txt_file_to_chunks_df before calling tokenizer when file content is empty, matching the pattern in html/convert.py
- Mock _get_tokenizer/_get_txt_tokenizer in txt and html unit tests to avoid hitting HuggingFace (proxy-blocked in this environment)
- Skip audio benchmark test gracefully when Ray cannot write to the filesystem (EROFS) rather than failing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move docstrings in txt_file_to_chunks_df and txt_bytes_to_chunks_df to immediately follow the def line so they are recognized as actual docstrings (previously they appeared after variable assignments)
- Remove the conflicts block in pyproject.toml that referenced undefined cpu/cu130 extras; torch wheel selection is already handled by sys_platform markers in [tool.uv.sources]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Merge the remote extra into base dependencies (document parsing and NIM client libs are now core; no extra needed for remote-only installs)
- Move lancedb into base dependencies (it is the default VDB solution)
- Merge local-cpu and local-gpu into a single local extra (GPU assumed)
- Drop lancedb from [stores]; update [all] to reference [local]
- Remove now-unnecessary lancedb import guards in lancedb_store.py
- Update the integration-test workflow to install bare nemo_retriever
- Update DEPENDENCY_LAYERS.md to reflect the new structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace the core-only install with the [local] extra for the local GPU path; add a separate remote-only install command for NIM inference users
- Reframe the Step 2 torch override: clarify that it is a PyPI/CUDA workaround, not a completion of the GPU stack; mark it skippable for remote users
- Add a note before the pipeline examples that they require [local] + CUDA torch
- Fix the [svg] extra reference to [multimedia]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…workflow Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor

# Conflicts:
#   nemo_retriever/pyproject.toml
- Add ensure_openai_embeddings_http_url in nv-ingest string_processing
- Use it in infer_microservice (api + client) and retriever main_text_embed
- CPU embed actor: local HF when no remote endpoint instead of default integrate URL

Made-with: Cursor
After the embed URL fix, the CPU actor uses a local embedder when no HTTP endpoint is set. Update the unit test to assert that behavior and mock create_local_embedder so the suite does not require HF revisions. Made-with: Cursor
Description
This branch streamlines the default NeMo Retriever install and dependency story, and tightens optional vs core packages in `nemo_retriever/pyproject.toml`.

Dependency and install model

- `nv-ingest` is no longer a direct dependency of `nemo-retriever`; editable installs still wire `nv-ingest-api` and `nv-ingest-client` from the monorepo paths.
- New optional extras: `[local]` (GPU-oriented torch, transformers, Nemotron model packages, vLLM on Linux, etc.), `[multimedia]` (audio/ASR + SVG via `cairosvg`), `[stores]` (DuckDB and Neo4j), `[benchmarks]` (datasets / open-clip), and `[all]` as a convenience meta-extra.
- The torch index (renamed `pytorch-cu130`) matches that layout.
- `uv.lock` is regenerated to match the slimmer dependency graph.

Docs
- `nemo_retriever/README.md`: clearer split between remote NIM-only vs local GPU setups; documents `[local]` and the CUDA 13 torch override; the SVG install now points at `[multimedia]` (with a direct `cairosvg` alternative).

Runtime and API behavior
- Ensures a single `/embeddings` suffix on HTTP embedding URLs (delegates to the `nv_ingest_api` helper). Tests aligned with the local-HF default when no remote URL is configured.

CI
- Existing CI installs `nemo_retriever[all,dev]` instead of separate editable src/api/client plus `nemo_retriever`.
- New `integration-test-library-mode.yml`: library-mode graph pipeline on Windows and macOS (ARM and Intel), using `NVCF_API_KEY` for remote NIM endpoints.

Misc
Checklist